Abstract



The problem of finding an optimum using noisy evaluations of a smooth cost function arises in many contexts, including economics, business, medicine, experiment design, and foraging theory. We derive an asymptotic bound $E[(x_t - x^*)^2] \ge O(t^{-1/2})$ on the rate of convergence of a sequence $(x_0, x_1, \ldots)$ generated by an unbiased feedback process observing noisy evaluations of an unknown quadratic function maximised at $x^*$. The bound is tight, as the proof leads to a simple algorithm which meets it. We further establish a bound on the total regret, $E[\sum_{\tau=1}^{t} (x_\tau - x^*)^2] \ge O(t^{1/2})$. These bounds may impose practical limitations on an agent's performance, as $O(\epsilon^{-4})$ queries are made before the queries converge to $x^*$ with $\epsilon$ accuracy.



1 Introduction 



Finding an input $x$ to a system so as to optimise some property $f(x)$ of the system's output, using only noisy measurements, is a ubiquitous problem. For instance, in medicine $x$ might be a drug dosage and $f(x)$ the probability of a successful outcome; in business $x$ might be the price set by a manufacturer and $f(x)$ the consequent profit; in game theory $x$ might be a strategy and $f(x)$ its return; and in evolutionary theory $x$ might be the brightness of a bird's plumage and $f(x)$ the consequent reproductive success.

When the measurements of $f(x)$ are noise-free this is a classical optimisation problem, as studied by Gauss. Optimisation theory remains to this day a productive branch of applied mathematics. In general, the assumption is made that the function to be optimised takes on a simplified form in the neighbourhood of its optimum, most often quadratic. The criterion by which we evaluate such algorithms is typically the convergence rate of their estimate of the location of the optimum, although the complexity of the algorithm itself can also be a consideration.

Here we consider a situation in which the measurements of the function are assumed to be noisy. A similar situation in which noisy measurements of the gradient are available is studied in stochastic gradient optimisation (Robbins and Monro, 1951; Ljung, 1977; Widrow et al., 1976). Here, however, we assume that gradient information is not available. We further assume that we are interested not in our estimate of the optimum converging as rapidly as possible, but rather in the queries themselves converging to the optimum as rapidly as possible. As a practical matter, the convergence of the queries themselves is important when the function $f(x)$ is a measure of consequence, and making a measurement at $x$ has an actual expected cost of $f(x)$, as in measuring the survival rate of a medical treatment or the return of an economic decision.

Gradient information would make this problem much easier. For illustration, consider two closely related optimisation problems. In each, an inaccurate rifle with unknown bias can be swivelled horizontally, and we wish to swivel it so as to maximise the probability of hitting a small target. Due to the inaccuracy of the rifle and the small target size, we are unlikely to hit the target even when the rifle is aimed optimally. In one situation, we know after each shot whether the bullet went to the left or the right of the target. In the other situation, we know only whether the bullet hit the target. Knowing whether the bullet went to the right or the left of the target corresponds to having an estimate of the gradient, and allows rapid convergence to the correct position by simply making successively smaller adjustments after each shot away from the side to which the bullet missed. But without this gradient information, it is difficult to know in which direction to adjust the aim in response to a miss. In fact, a single miss in isolation does not seem of any help in improving the aim. It is our goal here to precisely characterise the difficulty of such situations.
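The easier of the two rifle problems, the one with left/right feedback, can be sketched in a few lines. This is only an illustration under assumed details that are not in the text: Gaussian shot scatter, a target at position 0, a step size of `gain / t`, and the hypothetical names `aim_with_side_feedback`, `bias`, `spread`, and `gain`.

```python
import numpy as np

def aim_with_side_feedback(bias, shots=20000, spread=1.0, gain=2.0, seed=0):
    """Adjust the aim using only the side (left/right) on which each shot missed.

    Assumed model (not from the paper): a shot lands at aim + bias + noise,
    the target is at 0, and the noise is Gaussian with std `spread`.
    """
    rng = np.random.default_rng(seed)
    aim = 0.0
    for t in range(1, shots + 1):
        landing = aim + bias + spread * rng.normal()
        # Move away from the side of the miss, with successively smaller steps.
        aim -= (gain / t) * np.sign(landing)
    return aim

# The aim should end up close to -bias, cancelling the unknown bias.
final_aim = aim_with_side_feedback(bias=3.0)
```

No comparably simple update exists for the hit/miss-only variant, since a single miss carries no direction; that is the case the rest of the paper analyses.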



2 Proof Sketch 



We construct an inequality which establishes a lower bound on the rate of convergence of the queries $x_t$ to the optimum $x^*$. The inequality follows from the observation that if the queries $x_t$ are more spread out, the estimate of the optimum $x^*$ will have less uncertainty. This relationship, in which faster convergence of the queries leads to slower convergence of the estimate of $x^*$, is quantified using the statistical notion of the leverage of the data, which limits the accuracy of an estimate of a slope. This gives a lower bound on the speed with which the queries $x_t$ can converge to $x^*$. Violation of the bound would imply a contradiction: that the queries converge to the optimum faster than does the best estimate of the optimum.



3 Detailed Derivation 



We consider an unbiased feedback system which uses noisy measurements to find the $x$ which maximises $f(x)$, where $f(x)$ is locally quadratic about its maximum $x^*$. To simplify the derivation we will assume that $f(x)$ is not merely locally but globally quadratic,

$$f(x) = -a x^2 + b x + c \tag{1}$$

that the quadratic coefficient $a > 0$ is known, leaving unknown only the linear and constant terms $b$ and $c$, and that each noisy measurement of $f(x)$ is corrupted by zero-mean i.i.d. additive noise of variance $\sigma^2$.

Let $x_0, x_1, \ldots$ be the sequence of points evaluated. We establish the following bound:

Theorem 1 For sufficiently large $t$ and an unbiased feedback process that calculates $x_t$ using information available prior to $t$,

$$E[(x_t - x^*)^2] \ge \frac{\sigma}{\sqrt{8}\,a}\, t^{-1/2} \tag{2}$$



Proof: Since $a$ is known we can add $a x_\tau^2$ to the measurements and fit $b$ and $c$ to the resulting noisy line. The variance of $\hat{b}_t$, the best unbiased estimate of $b$ given measurements made prior to time $t$, is limited by the Cramér-Rao bound, which depends on the level of measurement noise and the leverage about the sample mean $\bar{x}_t = (x_0 + x_1 + \cdots + x_{t-1})/t$,

$$\operatorname{var} \hat{b}_t \ge \frac{\sigma^2}{\sum_{\tau<t} (x_\tau - \bar{x}_t)^2} \tag{3}$$
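The role of leverage in this bound can be checked numerically. The sketch below compares $\sigma^2$ divided by the leverage with the empirical variance of a least-squares slope over many simulated noisy lines. The Gaussian noise, the particular designs, and the names `slope_variance_bound` and `empirical_slope_variance` are assumptions made for illustration only.

```python
import numpy as np

def slope_variance_bound(x, sigma):
    """sigma^2 divided by the leverage of the design x about its sample mean."""
    x = np.asarray(x, dtype=float)
    leverage = np.sum((x - x.mean()) ** 2)
    return sigma ** 2 / leverage

def empirical_slope_variance(x, sigma, b=0.7, c=0.2, trials=20000, seed=0):
    """Variance of the least-squares slope over many noisy lines b*x + c."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                      # centred query points
    y = b * x + c + sigma * rng.normal(size=(trials, x.size))
    b_hat = (y @ xc) / np.sum(xc ** 2)     # least-squares slope, one per trial
    return b_hat.var()

wide = np.linspace(-1.0, 1.0, 11)   # spread-out queries: high leverage
narrow = 0.1 * wide                 # tightly clustered queries: low leverage
bound_wide = slope_variance_bound(wide, sigma=1.0)
bound_narrow = slope_variance_bound(narrow, sigma=1.0)
observed_wide = empirical_slope_variance(wide, sigma=1.0)
```

Shrinking the spread of the queries by a factor of 10 raises the bound on the slope variance by a factor of 100, which is the tension the proof exploits.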



This leverage is bounded by the leverage about any point; here we choose $x^*$, the desired point of convergence.

$$\sum_{\tau<t} (x_\tau - \bar{x}_t)^2 \le \sum_{\tau<t} (x_\tau - x^*)^2 \tag{4}$$

$$\operatorname{var} \hat{b}_t \ge \frac{\sigma^2}{\sum_{\tau<t} (x_\tau - x^*)^2} \tag{5}$$


Because $x^* = b/2a$ the variance of an estimate of $x^*$ is related to the variance of an estimate of $b$,

$$\operatorname{var} \hat{x}^*_t = \frac{1}{4a^2} \operatorname{var} \hat{b}_t \tag{6}$$



where $\hat{x}^*_t$ is the best unbiased estimate of $x^*$ given measurements made prior to $t$. By definition $\hat{x}^*_t$ cannot be a worse estimate of $x^*$ than is $x_t$, and we have already seen a bound on the quality of the estimate $\hat{x}^*_t$, so

$$E[(x_t - x^*)^2] \ge \operatorname{var} \hat{x}^*_t \ge \frac{\sigma^2}{4a^2 \sum_{\tau<t} (x_\tau - x^*)^2} \tag{7}$$

where the expectation $E[\cdot]$ is taken over realisations of the measurement noise.

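We now assume¹ that $x_t$ converges polynomially, $E[(x_t - x^*)^2] = (k t^r)^2$, and substitute this above to find $r$ and $k$. The leverage about $x^*$ can be evaluated,

$$E\Big[\sum_{\tau<t} (x_\tau - x^*)^2\Big] = k^2 \sum_{\tau<t} \tau^{2r} \approx \frac{k^2}{1+2r}\, t^{1+2r} \tag{8}$$

Eq. (8) can be substituted into the two-sided bound on $\operatorname{var} \hat{x}^*_t$ in Eq. (7), yielding

$$k^2 t^{2r} \ge \frac{\sigma^2 (1+2r)}{4 a^2 k^2}\, t^{-(1+2r)}$$

or

$$k^4 \ge \frac{\sigma^2 (1+2r)}{4 a^2}\, t^{-(1+4r)} \tag{9}$$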



This can only be satisfied if the right-hand side is bounded, which implies that $r \ge -1/4$, and hence

$$E[(x_t - x^*)^2] \ge O(t^{-1/2}) \tag{10}$$

The most aggressive convergence is for $r = -1/4$, at which point equality is achieved when $k^2 = \sigma/(\sqrt{8}\,a)$. Substituting yields Eq. (2).

Corollary 1 (Bound on Instantaneous Regret) The instantaneous regret (loss incurred at time $t$ due to ignorance) of an unbiased online optimiser is bounded below in expectation by

$$E[f(x^*) - f(x_t)] \ge \frac{\sigma}{\sqrt{8}}\, t^{-1/2} \tag{11}$$

Proof: Note that $f(x^*) - f(x) = a(x - x^*)^2$ and substitute into Theorem 1.

Corollary 2 (Bound on Total Regret) The total regret prior to time $t$, defined by $R_t = \sum_{\tau=1}^{t} f(x^*) - f(x_\tau)$, incurred by an unbiased feedback process is bounded below in expectation by

$$E[R_t] \ge \frac{\sigma}{\sqrt{2}}\, t^{1/2} \tag{12}$$

Proof: Summation of the bound on instantaneous regret, using $\sum_{\tau=1}^{t} \tau^{-1/2} \approx 2 t^{1/2}$.

Note: The expected regret bound is independent of the constant of curvature $a$, whose effect cancels itself out in the analysis. This is necessarily the case, because we could define $\tilde{f}(x) = f(100x)$ and an attempt to optimise $\tilde{f}(x)$ should yield the same regret as an attempt to optimise $f(x)$, despite their differing curvatures.

Theorem 2 (Optimal Algorithm) The stochastic algorithm

$$x_t = \hat{x}^*_t + N\big((\operatorname{stderr} \hat{x}^*_t)^p\big) \tag{13}$$

is unbiased and with $p = 2$ achieves $E[(x_t - x^*)^2] \approx (\sqrt{2}\,\sigma/a)\, t^{-1/2}$, $E[R_t] \approx \sigma\sqrt{8t}$, where $N(\sigma^2)$ is zero-mean $\sigma^2$-variance i.i.d. noise and $\operatorname{stderr} \hat{x}^*_t$ is the standard error of the unbiased estimator $\hat{x}^*_t$.

Proof: The algorithm involves only unbiased estimates and is therefore unbiased.

The inequalities above become equalities when

$$x_t = \hat{x}^*_t + N\big(\sqrt{2}\,(\sigma/a)\, t^{-1/2}\big) \tag{14}$$

which has the same injected variance (up to absorbed constant factors) as in the proposed algorithm.

Note: The existence of this algorithm implies that the earlier bounds are tight. Interestingly, the algorithm does not require knowledge of $a$ or $\sigma$, which are used only in the analysis. Due to the statistics of the situation, $\operatorname{stderr} \hat{x}^*_t$ scales appropriately with $a$ and $\sigma$.

¹ If the fastest possible convergence bound were not of this form then we would obtain a valid bound, but not a tight one. However, we constructively show that the bound obtained is tight.
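The algorithm of Theorem 2 can be simulated directly. The sketch below is one possible reading of Eq. (13), not the author's code: for simplicity it assumes that $a$ and $\sigma$ are known, so that $\hat{x}^*_t$ and its standard error come from a straight-line fit as in the derivation, and it uses Gaussian noise throughout. The function name `run_queries` and its parameters are hypothetical.

```python
import numpy as np

def run_queries(T, p=2.0, a=1.0, sigma=1.0, b=0.0, c=0.0, seed=0):
    """Simulate x_t = xhat_t + N((stderr xhat_t)^p); return cumulative regret.

    Assumptions (for illustration only): a and sigma are known, noise is
    Gaussian, and the run starts with two queries at x* +/- 1.
    """
    rng = np.random.default_rng(seed)
    x_star = b / (2.0 * a)

    def measure(x):
        # Noisy evaluation of f(x) = -a x^2 + b x + c.
        return -a * x * x + b * x + c + sigma * rng.normal()

    xs = [x_star - 1.0, x_star + 1.0]
    # Adding a x^2 turns each measurement into a point on the noisy line b x + c.
    zs = [measure(x) + a * x * x for x in xs]

    # Running sums give a constant-time least-squares update per query.
    n = 2
    sx, sz = sum(xs), sum(zs)
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    regret = [a * (x - x_star) ** 2 for x in xs]

    for _ in range(T - 2):
        leverage = sxx - sx * sx / n              # sum of (x - mean)^2
        b_hat = (sxz - sx * sz / n) / leverage    # least-squares slope
        x_hat = b_hat / (2.0 * a)                 # estimate of x*
        stderr = sigma / (2.0 * a * np.sqrt(leverage))
        # Injected query noise has variance stderr^p, i.e. std stderr^(p/2).
        x = x_hat + rng.normal(0.0, stderr ** (p / 2.0))
        z = measure(x) + a * x * x
        n += 1
        sx += x
        sz += z
        sxx += x * x
        sxz += x * z
        regret.append(a * (x - x_star) ** 2)

    return np.cumsum(regret)

total_regret = run_queries(T=2000, p=2.0, seed=1)
```

Averaging `run_queries(T, p)[-1]` over many seeds and several values of `p` gives a rough way to compare against the $\sigma\sqrt{8t}$ figure quoted in Theorem 2, bearing in mind that the known-$a$, known-$\sigma$ shortcut used here differs from the algorithm as stated.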



4 Discussion 

Although the above theorems all assume unbiased estimates, integration of prior information would, assuming that the prior is smooth, only change an initial transient response of the system, leaving the asymptotic behaviour unchanged. The limits on regret would change by only a small additive constant whose value would depend upon the details of the prior.

Figure 1: Total regret as a function of time for 100 overlaid runs of the algorithm of Theorem 2 (bottom left), which optimally trades off exploration and exploitation; with $p = 0.8$ for more query noise (bottom right), resulting in less between-run variation but more regret; with $p = 3.6$ for less query noise (top left), resulting in more between-run variation; and for the greedy strategy $x_t = \hat{x}^*_t$ with zero query noise (top right), in which runs rapidly converge to incorrect estimates. All runs used $\sigma = a = 1$, $b = c = 0$, and were initialised with two queries at $x = x^* \pm 1$.

The above exploration/exploitation tradeoff and bound hold when using noisy measurements and when the cost of an evaluation is the value of the function being optimised. The result is robust, in that small changes to the model (a cost function quadratic only in the neighbourhood of the optimum, for instance) will not change its character.



However, a related situation, finding the zero $x^*$ of a linear function using noisy measurements where the expected loss of a measurement $x_t$ is quadratic in $x_t - x^*$, has a surprisingly different result. In this matching-shoulders lob-pass case, formalised by Abe and Takeuchi (1993) based on the foraging theory question posed by Herrnstein (1990), a convergence rate of $E[(x_t - x^*)^2] = O(t^{-1})$ and thus an expected regret of $E[R_t] = O(\log t)$ can be achieved (Kilian et al., 1994; Hiraoka and Amari, 1998; Takeuchi et al., 2000). This is because the measurements in that setting serve the purpose of gradient information.

Figure 2: Bar graph (log scale) of total regret after 10^ queries, averaged over 100 runs, for the algorithm of Theorem 2 with $\sigma = 1$ and $a = 1$. Bars shown for values of $p$ both above and below the optimal $p = 2$, and also for the greedy algorithm of zero injected noise. Risers show sample standard deviations.

Procedures which do not insert sufficient variability into their queries acquire only finite leverage, resulting (with probability one) in convergence to a non-optimum. This is seen in the upper simulations of Fig. 1. The minimal total regret in Fig. 2 is for an algorithm injecting slightly less query noise than $\operatorname{stderr} \hat{x}^*_t$. This is due to the slight additional leverage caused by fluctuation of the estimate over time.
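The finite-leverage claim can be made concrete with a short worked bound. This is a heuristic sketch under the polynomial-rate assumption $E[(x_t - x^*)^2] = (k t^r)^2$ used in the derivation, with a hypothetical exponent $r < -1/2$ chosen for illustration; it replaces the random leverage in Eq. (7) by its expectation. Here $\zeta$ denotes the Riemann zeta function.

```latex
E\Big[\sum_{\tau=1}^{\infty} (x_\tau - x^*)^2\Big]
  = k^2 \sum_{\tau=1}^{\infty} \tau^{2r}
  = k^2\, \zeta(-2r) < \infty
  \qquad (r < -1/2),
\qquad \text{so} \qquad
\operatorname{var} \hat{x}^*_t \;\gtrsim\; \frac{\sigma^2}{4 a^2 k^2\, \zeta(-2r)} > 0 .
```

Under these assumptions the uncertainty in the estimate of the optimum never falls below a fixed positive level, however many further queries are made.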

Some procedures used in practice for problems of this character appear to attempt to exceed the convergence bound established here, for instance in medical treatment optimisation. The above bounds should serve as a caution concerning the ease with which a seemingly reasonable optimisation procedure can converge to a non-optimum. In the setting considered here, when insufficient query variance is used convergence to a non-optimum occurs, and standard statistical analysis of the ongoing measurements will fail to give any hint of a problem. Query variability must be injected when the setting itself requires it, rather than only in response to empirical signs of premature convergence.

In business, the best selling price (which is not 
subject to the above constraint, as noisy gradient 
information is available) should be faster to esti- 
mate than the supply or demand curves, which 
seem potentially subject to this bound. This would 
argue that firms that set their prices by first es- 
timating supply and demand curves may be at 
a disadvantage against those that set prices di- 
rectly. More speculatively, regulatory regimes 
have surprising variability considering that all 
are designed to further similar goals. Legal sys- 
tems have similar diversity. The ultimate cause 
of this variability may be the intrinsic difficulty 
of gradient-free noisy query optimisation. Even 
more speculatively, sexual selection for adaptive 
traits may provide a proxy for gradient informa- 
tion, thus speeding evolution. 